Generalized Multi-Linear Principal Component Analysis of Binary Tensors

Authors

  • Jakub Mažgút
  • Peter Tiňo
  • Mikael Bodén
  • Hong Yan

Abstract

Current data processing tasks often involve the manipulation of multi-dimensional objects, tensors. In many real-world applications, such as gait recognition, document analysis or graph mining (with graphs represented by adjacency tensors), the tensors can be constrained to binary values only. To the best of our knowledge, there is at present no principled, systematic framework for the decomposition of binary tensors. To close this gap, we propose a generalized multi-linear model for principal component analysis of binary tensors (GML-PCA). In the model formulation, to account for the binary nature of the data, each tensor element is modeled by a Bernoulli noise distribution. To extract the dominant trends in the data, we constrain the natural parameters of the Bernoulli distributions to lie in a subspace spanned by a reduced set of basis (principal) tensors. The Bernoulli distribution is a member of the exponential family, whose helpful analytical properties allow us to derive an iterative scheme for estimating the basis tensors and other model parameters via maximum likelihood. We evaluate and compare the proposed GML-PCA technique with an existing real-valued tensor decomposition method (TensorLSI) in two scenarios: (1) a series of controlled experiments involving synthetic data; (2) a real-world biological dataset of DNA sub-sequences from different functional regions, with the sequences represented by binary tensors. The experiments suggest that the GML-PCA model is better suited for modeling binary tensors than its real-valued counterpart, the TensorLSI model.
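To make the modeling idea concrete, here is a minimal Python/NumPy sketch under stated assumptions. For a binary element y with natural parameter θ, the Bernoulli log-likelihood is y·θ − log(1 + e^θ), and its gradient with respect to θ is y − σ(θ), where σ is the logistic function. The sketch constrains the natural parameters of the n-th tensor to θ_n = Δ + Σ_k c_{n,k} B_k (a bias tensor Δ plus a combination of K basis tensors B_k) and maximizes the log-likelihood by plain gradient ascent. This is a simplification, not the GML-PCA algorithm itself: the basis tensors here are unconstrained (effectively logistic PCA on vectorized tensors) rather than multi-linear, and gradient ascent stands in for the paper's iterative maximum-likelihood scheme. The names `fit_binary_tensor_pca` and `bernoulli_log_likelihood`, the step size and the iteration count are hypothetical choices.

```python
import numpy as np


def sigmoid(x):
    # Numerically stable logistic function: sigma(x) = 0.5 * (1 + tanh(x / 2)).
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def bernoulli_log_likelihood(X, C, B, bias):
    """Sum over all tensor elements of y * theta - log(1 + exp(theta))."""
    N = X.shape[0]
    Y = X.reshape(N, -1).astype(float)
    theta = C @ B.reshape(B.shape[0], -1) + bias.ravel()
    return float(np.sum(Y * theta - np.logaddexp(0.0, theta)))


def fit_binary_tensor_pca(X, n_basis, n_iter=500, lr=0.2, seed=0):
    """Hypothetical sketch: fit N binary tensors X[n] (entries 0/1) with natural
    parameters theta_n = bias + sum_k C[n, k] * B[k], using plain gradient ascent
    on the Bernoulli log-likelihood (not the paper's own iterative scheme).
    Basis tensors are unconstrained here, i.e. not multi-linear."""
    rng = np.random.default_rng(seed)
    N, shape = X.shape[0], X.shape[1:]
    D = int(np.prod(shape))
    Y = X.reshape(N, D).astype(float)               # one flattened tensor per row
    B = 0.01 * rng.standard_normal((n_basis, D))    # basis tensors, flattened
    C = 0.01 * rng.standard_normal((N, n_basis))    # per-tensor coefficients
    bias = np.zeros(D)                              # shared offset tensor
    for _ in range(n_iter):
        theta = C @ B + bias                        # natural parameters
        R = Y - sigmoid(theta)                      # d loglik / d theta
        grad_C, grad_B, grad_bias = R @ B.T, C.T @ R, R.sum(axis=0)
        # Simultaneous ascent step; each gradient is divided by the number of
        # elements its parameter touches so one step size suits all of them.
        C += lr * grad_C / D
        B += lr * grad_B / N
        bias += lr * grad_bias / N
    return C, B.reshape((n_basis,) + shape), bias.reshape(shape)


if __name__ == "__main__":
    # Synthetic binary tensors of shape 4 x 5 x 3 drawn from a rank-2 model.
    rng = np.random.default_rng(1)
    true_theta = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 60))
    X = (rng.random((40, 60)) < sigmoid(true_theta)).astype(int).reshape(40, 4, 5, 3)
    C, B, bias = fit_binary_tensor_pca(X, n_basis=2)
    print("log-likelihood at theta = 0:", -X.size * np.log(2.0))
    print("log-likelihood after fitting:", bernoulli_log_likelihood(X, C, B, bias))
```

Running the script fits two basis tensors to synthetic 4 × 5 × 3 binary tensors and prints the log-likelihood before and after fitting; the fitted value should exceed the θ = 0 baseline of −(number of elements)·log 2. Rescaling each gradient by the number of elements it touches is a convenience choice so that a single step size stays stable for the coefficients, the basis tensors and the bias.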

Similar resources

Generalized Principal Component Analysis: Projection of Saturated Model Parameters

Principal component analysis (PCA) is very useful for a wide variety of data analysis tasks, but its implicit connection to the Gaussian distribution can be undesirable for discrete data such as binary and multi-category responses or counts. We generalize PCA to handle various types of data using the generalized linear model framework. In contrast to the existing approach of matrix factorizatio...

Classification of Multi-Frequency Polarimetric SAR Images Based on Multi-Linear Subspace Learning of Tensor Objects

One key problem for the classification of multi-frequency polarimetric SAR images is to extract target features simultaneously in the aspects of frequency, polarization and spatial texture. This paper proposes a new classification method for multi-frequency polarimetric SAR data based on tensor representation and multi-linear subspace learning (MLS). Firstly, each cell of the SAR images is repr...

A Generalization of Principal Component Analysis to the Exponential Family

Principal component analysis (PCA) is a commonly applied technique for dimensionality reduction. PCA implicitly minimizes a squared loss function, which may be inappropriate for data that is not real-valued, such as binary-valued data. This paper draws on ideas from the Exponential family, Generalized linear models, and Bregman distances, to give a generalization of PCA to loss functions that w...

A Generalized Linear Model for Principal Component Analysis of Binary Data

We investigate a generalized linear model for dimensionality reduction of binary data. The model is related to principal component analysis (PCA) in the same way that logistic regression is related to linear regression. Thus we refer to the model as logistic PCA. In this paper, we derive an alternating least squares method to estimate the basis vectors and generalized linear coefficients of the...

Generalized Multilinear Model for Dimensionality Reduction of Binary Tensors

Generalized multilinear model for dimensionality reduction of binary tensors (GMM-DR-BT) is a technique for computing low-rank approximations of multi-dimensional data objects, tensors. The model exposes a latent structure that represents dominant trends in the binary tensorial data while retaining as much information as possible. Recently, there exist several models for computing the low-rank ...

Publication date: 2010